motion blur
A Benchmark Dataset for Event-Guided Human Pose Estimation and Tracking in Extreme Conditions
Multi-person pose estimation and tracking have been actively researched by the computer vision community due to their practical applicability. However, existing human pose estimation and tracking datasets have only been successful in typical scenarios, such as those without motion blur or with well-lit conditions. These RGB-based datasets offer little support for learning under severe motion blur or poor lighting, leaving models trained on them inherently vulnerable to such scenarios. As a promising solution, bio-inspired event cameras exhibit robustness in extreme scenarios due to their high dynamic range and microsecond-level temporal resolution. Therefore, in this paper, we introduce a new hybrid dataset encompassing both RGB and event data for human pose estimation and tracking in two extreme scenarios: low-light and motion-blur environments. The proposed Event-guided Human Pose Estimation and Tracking in eXtreme Conditions (EHPT-XC) dataset covers motion blur caused by dynamic objects and low-light conditions both individually and simultaneously. With EHPT-XC, we aim to inspire researchers to tackle pose estimation and tracking in extreme conditions by leveraging the advantages of the event camera.
Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation
Cheng, Shihan, Kulkarni, Nilesh, Hyde, David, Smirnov, Dmitriy
Fine-tuning large-scale text-to-video diffusion models to add new generative controls, such as those over physical camera parameters (e.g., shutter speed or aperture), typically requires vast, high-fidelity datasets that are difficult to acquire. In this work, we propose a data-efficient fine-tuning strategy that learns these controls from sparse, low-quality synthetic data. We show that fine-tuning on such simple data not only enables the desired controls but actually yields superior results to models fine-tuned on photorealistic "real" data. Beyond demonstrating these results, we provide a framework that justifies this phenomenon both intuitively and quantitatively.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
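The data-efficiency claim above hinges on pairing simple synthetic clips with the camera parameter they exhibit. Below is a minimal, hypothetical sketch of one such pairing, simulating shutter speed via sub-frame averaging; the rendering scheme, function names, and parameter values are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: build a tiny synthetic clip whose motion blur is
# controlled by a "shutter speed" parameter, paired with that parameter as a
# conditioning label. Names and the rendering scheme are illustrative only.
import numpy as np

def render_frame(t, size=64):
    """Render one sharp frame of a bright square moving left to right."""
    frame = np.zeros((size, size), dtype=np.float32)
    x = int((t % 1.0) * (size - 8))          # horizontal position at time t
    frame[28:36, x:x + 8] = 1.0
    return frame

def render_blurred_clip(shutter_open_frac, n_frames=16, substeps=32):
    """Simulate motion blur by averaging sub-frames over the exposure window.

    shutter_open_frac in (0, 1]: fraction of the frame interval the shutter
    stays open; larger values produce longer blur streaks.
    """
    clip = []
    for f in range(n_frames):
        t0 = f / n_frames
        exposure = shutter_open_frac / n_frames
        subs = [render_frame(t0 + exposure * s / substeps) for s in range(substeps)]
        clip.append(np.mean(subs, axis=0))
    return np.stack(clip)                     # (n_frames, H, W)

# A sparse dataset: a handful of clips, each labeled with its shutter setting.
dataset = [(render_blurred_clip(s), s) for s in (0.1, 0.3, 0.6, 0.9)]
print(dataset[0][0].shape, "control value:", dataset[0][1])
```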
FMA-Net++: Motion- and Exposure-Aware Real-World Joint Video Super-Resolution and Deblurring
Youk, Geunhyuk, Oh, Jihyong, Kim, Munchurl
Real-world video restoration is plagued by complex degradations from motion coupled with dynamically varying exposure - a key challenge largely overlooked by prior works and a common artifact of auto-exposure or low-light capture. We present FMA-Net++, a framework for joint video super-resolution and deblurring that explicitly models this coupled effect of motion and dynamically varying exposure. FMA-Net++ adopts a sequence-level architecture built from Hierarchical Refinement with Bidirectional Propagation blocks, enabling parallel, long-range temporal modeling. Within each block, an Exposure Time-aware Modulation layer conditions features on per-frame exposure, which in turn drives an exposure-aware Flow-Guided Dynamic Filtering module to infer motion- and exposure-aware degradation kernels. FMA-Net++ decouples degradation learning from restoration: the former predicts exposure- and motion-aware priors to guide the latter, improving both accuracy and efficiency. To evaluate under realistic capture conditions, we introduce REDS-ME (multi-exposure) and REDS-RE (random-exposure) benchmarks. Trained solely on synthetic data, FMA-Net++ achieves state-of-the-art accuracy and temporal consistency on our new benchmarks and GoPro, outperforming recent methods in both restoration quality and inference speed, and generalizes well to challenging real-world videos.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.97)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
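As a rough illustration of per-frame exposure conditioning in the spirit of the Exposure Time-aware Modulation layer described above, the FiLM-style sketch below maps a scalar exposure time to channel-wise scale and shift parameters. The mechanism, module name, and dimensions are assumptions for illustration, not the paper's actual layer.

```python
# Minimal PyTorch sketch of conditioning frame features on per-frame exposure:
# a small MLP maps the scalar exposure time to channel-wise scale/shift values
# (FiLM-style). This is an assumed mechanism, not the paper's ETM layer.
import torch
import torch.nn as nn

class ExposureModulation(nn.Module):
    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * channels),   # predicts per-channel (scale, shift)
        )

    def forward(self, feats: torch.Tensor, exposure: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) frame features; exposure: (B, 1) per-frame exposure time
        scale, shift = self.mlp(exposure).chunk(2, dim=-1)
        scale = scale[:, :, None, None]
        shift = shift[:, :, None, None]
        return feats * (1.0 + scale) + shift

feats = torch.randn(2, 32, 64, 64)
exposure = torch.tensor([[1 / 30.0], [1 / 250.0]])  # seconds, per frame
out = ExposureModulation(32)(feats, exposure)
print(out.shape)  # torch.Size([2, 32, 64, 64])
```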
DINO-Detect: A Simple yet Effective Framework for Blur-Robust AI-Generated Image Detection
Shen, Jialiang, Zheng, Jiyang, Xue, Yunqi, Chen, Huajie, Yao, Yu, Kang, Hui, Liu, Ruiqi, Gong, Helin, Yang, Yang, Wang, Dadong, Liu, Tongliang
With growing concerns over image authenticity and digital safety, the field of AI-generated image (AIGI) detection has progressed rapidly. Yet, most AIGI detectors still struggle under real-world degradations, particularly motion blur, which frequently occurs in handheld photography, fast motion, and compressed video. Such blur distorts fine textures and suppresses high-frequency artifacts, causing severe performance drops in real-world settings. We address this limitation with a blur-robust AIGI detection framework based on teacher-student knowledge distillation. A high-capacity teacher (DINOv3), trained on clean (i.e., sharp) images, provides stable and semantically rich representations that serve as a reference for learning. By freezing the teacher to maintain its generalization ability, we distill its feature and logit responses from sharp images to a student trained on blurred counterparts, enabling the student to produce consistent representations under motion degradation. Extensive experiments across benchmarks show that our method achieves state-of-the-art performance under both motion-blurred and clean conditions, demonstrating improved generalization and real-world applicability. Source code will be released at: Project Page.
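A minimal sketch of the sharp-to-blurred distillation objective described in the abstract, assuming generic placeholder encoders in place of DINOv3: the frozen teacher processes the sharp image, the student processes the blurred counterpart, and feature- and logit-matching terms are added to the supervised loss. Loss weights and architectures here are illustrative assumptions.

```python
# Hedged sketch of sharp-to-blurred teacher-student distillation. The encoders
# are stand-ins, not an actual DINOv3 API; weights alpha/beta are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))  # stand-in backbone
head_t = nn.Linear(256, 2)                                          # real vs. AI-generated
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
head_s = nn.Linear(256, 2)

for p in list(teacher.parameters()) + list(head_t.parameters()):
    p.requires_grad = False                                          # teacher stays frozen

def distill_loss(sharp, blurred, labels, alpha=1.0, beta=1.0):
    with torch.no_grad():
        f_t = teacher(sharp)                                         # teacher sees sharp image
        z_t = head_t(f_t)
    f_s = student(blurred)                                           # student sees blurred image
    z_s = head_s(f_s)
    feat_loss = F.mse_loss(f_s, f_t)                                 # feature matching
    logit_loss = F.kl_div(F.log_softmax(z_s, -1), F.softmax(z_t, -1),
                          reduction="batchmean")                     # logit matching
    cls_loss = F.cross_entropy(z_s, labels)                          # supervised term
    return cls_loss + alpha * feat_loss + beta * logit_loss

sharp = torch.randn(4, 3, 64, 64)
blurred = sharp + 0.1 * torch.randn_like(sharp)   # stand-in for motion blur
labels = torch.randint(0, 2, (4,))
print(distill_loss(sharp, blurred, labels).item())
```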
Hybrid CNN-ViT Framework for Motion-Blurred Scene Text Restoration
Rashid, Umar, Arshad, Muhammad Arslan, Ahmad, Ghulam, Anjum, Muhammad Zeeshan, Khan, Rizwan, Akmal, Muhammad
Motion blur in scene text images severely impairs readability and hinders the reliability of computer vision tasks, including autonomous driving, document digitization, and visual information retrieval. Conventional deblurring approaches are often inadequate in handling spatially varying blur and typically fall short in modeling the long-range dependencies necessary for restoring textual clarity. To overcome these limitations, we introduce a hybrid deep learning framework that combines convolutional neural networks (CNNs) with vision transformers (ViTs), thereby leveraging both local feature extraction and global contextual reasoning. The architecture employs a CNN-based encoder-decoder to preserve structural details, while a transformer module enhances global awareness through self-attention. Training is conducted on a curated dataset derived from TextOCR, where sharp scene-text samples are paired with synthetically blurred versions generated using realistic motion-blur kernels of multiple sizes and orientations. Model optimization is guided by a composite loss that incorporates mean absolute error (MAE), mean squared error (MSE), perceptual similarity, and structural similarity (SSIM). Quantitative evaluations show that the proposed method attains 32.20 dB in PSNR and 0.934 in SSIM, while remaining lightweight with 2.83 million parameters and an average inference time of 61 ms. These results highlight the effectiveness and computational efficiency of the CNN-ViT hybrid design, establishing its practicality for real-world motion-blurred scene-text restoration.
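To make the composite loss concrete, the sketch below combines MAE, MSE, a perceptual term, and an SSIM term as listed in the abstract. The perceptual extractor, the simplified single-window SSIM, and the loss weights are assumptions for brevity, not the paper's exact formulation.

```python
# Illustrative composite restoration loss: MAE + MSE + perceptual + SSIM terms.
# The perceptual net is a stand-in for a pretrained feature extractor, and the
# SSIM here is a simplified global (single-window) variant.
import torch
import torch.nn as nn
import torch.nn.functional as F

perceptual_net = nn.Sequential(              # stand-in feature extractor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
for p in perceptual_net.parameters():
    p.requires_grad = False

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole image (a simplification)."""
    mx = x.mean(dim=(1, 2, 3), keepdim=True)
    my = y.mean(dim=(1, 2, 3), keepdim=True)
    vx = ((x - mx) ** 2).mean(dim=(1, 2, 3))
    vy = ((y - my) ** 2).mean(dim=(1, 2, 3))
    cov = ((x - mx) * (y - my)).mean(dim=(1, 2, 3))
    mx, my = mx.flatten(), my.flatten()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def composite_loss(pred, target, w=(1.0, 1.0, 0.1, 0.5)):
    l_mae = F.l1_loss(pred, target)
    l_mse = F.mse_loss(pred, target)
    l_perc = F.mse_loss(perceptual_net(pred), perceptual_net(target))
    l_ssim = 1.0 - global_ssim(pred, target).mean()
    return w[0] * l_mae + w[1] * l_mse + w[2] * l_perc + w[3] * l_ssim

pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
print(composite_loss(pred, target).item())
```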
Lei Ma
The state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples with additive, random-noise-like perturbations. While such examples are rarely found in the physical world, the image blurring caused by object motion commonly occurs in practice, making its study especially important for widely adopted real-time image processing tasks (e.g., object detection and tracking). In this paper, we take the first step toward comprehensively investigating the potential hazards that motion-induced blur poses to DNNs. We propose a novel adversarial attack method that generates visually natural motion-blurred adversarial examples, named the motion-based adversarial blur attack (ABBA).
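As a generic illustration of a blur-based adversarial example (not the paper's ABBA formulation, which constrains the blur to visually natural object and background motion), the sketch below optimizes a small normalized blur kernel so that the blurred image maximizes a stand-in classifier's loss.

```python
# Generic sketch of a blur-based adversarial example: a normalized blur kernel
# is optimized so the blurred image drives up the classifier's loss. The model
# and hyperparameters are placeholders; this is not the ABBA method itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # stand-in classifier
model.eval()

image = torch.rand(1, 3, 32, 32)
label = torch.tensor([3])

kernel_logits = torch.zeros(1, 1, 7, 7, requires_grad=True)       # blur kernel parameters

def blur(img, logits):
    k = F.softmax(logits.view(1, -1), dim=-1).view(1, 1, 7, 7)     # kernel sums to 1
    k = k.repeat(3, 1, 1, 1)                                       # same kernel per channel
    return F.conv2d(img, k, padding=3, groups=3)

opt = torch.optim.Adam([kernel_logits], lr=0.1)
for _ in range(50):
    opt.zero_grad()
    loss = -F.cross_entropy(model(blur(image, kernel_logits)), label)  # ascend the loss
    loss.backward()
    opt.step()

adv = blur(image, kernel_logits).detach()
print("predicted class after blur attack:", model(adv).argmax(dim=1).item())
```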